Add a Python guide which demonstrates using a LLAMA model in serverless compute #649
Conversation
  - Nitric
  - API
  - AI & Machine Learning
languages:
Rebase and add start_steps. See the Go realtime guide for an example.
Rebased, but start_steps won't work with this repository - it requires users to download the Llama model separately.
Could the model be curled?
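For illustration, a minimal sketch of how the model could be fetched automatically rather than downloaded by hand; the Hugging Face URL and local path below are assumptions, not part of this PR:

# Hypothetical download step (not in this PR): fetch the GGUF model if it
# isn't already present, so readers don't need to download it manually.
import os
import urllib.request

# Assumed source URL and local path; adjust to wherever the model is hosted.
MODEL_URL = (
    "https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF"
    "/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf"
)
MODEL_PATH = "./models/Llama-3.2-1B-Instruct-Q4_K_M.gguf"

if not os.path.exists(MODEL_PATH):
    os.makedirs(os.path.dirname(MODEL_PATH), exist_ok=True)
    urllib.request.urlretrieve(MODEL_URL, MODEL_PATH)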
Demonstrate how a lightweight Llama model can be used with serverless compute
Force-pushed from 5a7d491 to d21e3e8 (Compare)
Co-authored-by: Ryan Cartwright <[email protected]>
# set 128MB of RAM
# See lambda configuration docs here:
# https://docs.aws.amazon.com/lambda/latest/dg/configuration-function-common.html#configuration-memory-console
Suggested change:
- # set 128MB of RAM
- # See lambda configuration docs here:
- # https://docs.aws.amazon.com/lambda/latest/dg/configuration-function-common.html#configuration-memory-console
+ # set 6GB of RAM
+ # Lambda vCPUs are proportional to memory allocation. And a larger amount of CPUs will improve LLM inference
# set a timeout of 15 seconds
# See lambda timeout values here:
# https://docs.aws.amazon.com/lambda/latest/dg/configuration-function-common.html#configuration-timeout-console
Suggested change:
# set a timeout of 15 seconds
# See lambda timeout values here:
# https://docs.aws.amazon.com/lambda/latest/dg/configuration-function-common.html#configuration-timeout-console
# # set a provisioned concurrency value
# # For info on provisioned concurrency for AWS Lambda see:
# # https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html
provisioned-concurrency: 0
Suggested change:
# # set a provisioned concurrency value
# # For info on provisioned concurrency for AWS Lambda see:
# # https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html
provisioned-concurrency: 0
# set the amount of ephemeral-storage: of 512MB
# For info on ephemeral-storage for AWS Lambda see:
# https://docs.aws.amazon.com/lambda/latest/dg/configuration-ephemeral-storage.html
ephemeral-storage: 1024
Update the comment to explain why this value is set.
response = llama_model(
    prompt=prompt,
    max_tokens=150,
    temperature=0.7,
    top_p=0.9,
    stop=["\n"]
)
Not sure if it's worthwhile, but making these options configurable to show off a bit more flexibility could be good.
e.g.
@main.post("/translate")
async def handle_translation(ctx: HttpContext):
    # Could still leave max_tokens hardcoded to make sure prompts don't exceed 30s
    max_tokens = ctx.req.query.get("max_tokens", default_max_tokens)
    temperature = ctx.req.query.get("temperature", default_temperature)
    text = ctx.req.json["text"]

We also support using raw text in the dashboard API testing, so not all prompts need to be wrapped in JSON.
## Conclusion

In this guide, we demonstrated how you can use a lightweight machine learning model like Llama with serverless compute, enabling you to efficiently handle real-time translation tasks without the need for constant infrastructure management.

The combination of serverless architecture and on-demand model execution provides scalability, flexibility, and cost-efficiency, ensuring that resources are only consumed when necessary. This setup allows you to run lightweight models in a cloud-native way, ideal for dynamic applications requiring minimal operational overhead.
It might be really cool to follow on from this guide with a websocket chatbot.
llama_model = Llama(model_path="./models/Llama-3.2-1B-Instruct-Q4_K_M.gguf")

# Function to perform translation using the Llama model
def translate_text(text):
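For context, here is a minimal sketch of how translate_text might wrap the llama_model call shown earlier in the diff; the prompt template, target language, and response handling are assumptions rather than the guide's actual implementation:

# Hedged sketch only: one plausible shape for the guide's translate_text helper,
# reusing the llama_model parameters shown in the diff above.
def translate_text(text):
    # Assumed prompt template; the real guide may phrase this differently.
    prompt = f"Translate the following text to French:\n{text}\nTranslation:"
    response = llama_model(
        prompt=prompt,
        max_tokens=150,
        temperature=0.7,
        top_p=0.9,
        stop=["\n"]
    )
    # llama-cpp-python returns a completion dict; take the first choice's text.
    return response["choices"][0]["text"].strip()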
I think translating text is an interesting use case, but would it also be simpler to pass the prompt through directly from the user's request and allow them to test any prompt (e.g. "What is the capital of France?"), especially if the goal is just to demonstrate running these models in serverless compute?
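If that passthrough approach were adopted, it might look roughly like the sketch below; the route name and response handling are assumptions, and the request fields mirror the ones used in the reviewer's earlier example:

# Hedged sketch: accept any prompt from the request and run it directly.
# Assumes `main` and `llama_model` are defined earlier in the guide.
@main.post("/prompt")
async def handle_prompt(ctx: HttpContext):
    prompt = ctx.req.json["prompt"]

    response = llama_model(
        prompt=prompt,
        max_tokens=150,  # kept hardcoded so requests stay within the Lambda timeout
        temperature=0.7,
        top_p=0.9,
    )
    ctx.res.body = response["choices"][0]["text"]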
  - python
---

# Using LLama models with serverless infrastructure
A title of "Building AWS LLAMBDAS" just popped into my head; not sure if it's good, as the last part is a bit hard to read :P (I know it applies to other serverless compute as well, but an opportunity for wordplay seems hard to pass up).
# We add more storage to the lambda function, so it can store the model
ephemeral-storage: 1024
Is this true? Isn't the model baked into the container already?
Guide was retargeted; the reviews are stale and will now cause confusion.
In this guide, we demonstrate how you can use a lightweight machine learning model like Llama with serverless compute. This example performs language translation using Llama-3.2-1B-Instruct-Q4_K_M.